Learning to Identify Fragmented Words in Spoken Discourse
Abstract
Disfluent speech makes spoken language utterances harder to process. In this paper we concentrate on identifying one disfluency phenomenon: fragmented words. Our data, from the Spoken Dutch Corpus, comprise nearly 45,000 sentences of human discourse, ranging from spontaneous chat to media broadcasts. We classify each lexical item in a sentence as either a completely uttered word or an incompletely uttered, i.e. fragmented, word. The task is carried out by both the IB1 and RIPPER machine learning algorithms, trained on a variety of features with an extensive optimization strategy. Our best classifier achieves an F-score of 74.9%, a significant improvement over the baseline. We discuss why memory-based learning is more successful than rule induction at correctly classifying fragmented words.
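To make the task setup concrete, the following is a minimal sketch of per-token classification with an IB1-style nearest-neighbour learner and an F-score on the fragmented class. It is an illustration under stated assumptions, not the paper's implementation: the three features in `token_features`, the `FRAG`/`OK` labels, and the English toy sentences are all invented here, and the distance is a plain unweighted overlap count with no feature weighting or parameter optimization.

```python
from collections import Counter

def token_features(tokens, i):
    """Hypothetical features for token i (not the paper's feature set)."""
    focus = tokens[i]
    prev = tokens[i - 1] if i > 0 else "_"
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "_"
    # (previous word, is the token very short?, is it a proper prefix of the next word?)
    return (prev, len(focus) <= 2, nxt != focus and nxt.startswith(focus))

def overlap_distance(a, b):
    """Number of feature positions whose values differ (unweighted overlap)."""
    return sum(x != y for x, y in zip(a, b))

def classify(memory, features, k=1):
    """Majority label among the k stored instances closest to `features`."""
    nearest = sorted(memory, key=lambda item: overlap_distance(item[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def f_score(gold, predicted, positive="FRAG"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented toy data: "wa" and "ho" are fragments of the words that follow them.
train = [
    (["i", "wa", "want", "tea"], ["OK", "FRAG", "OK", "OK"]),
    (["we", "go", "ho", "home"], ["OK", "OK", "FRAG", "OK"]),
]

# "Training" a memory-based learner just stores every labelled instance.
memory = [
    (token_features(tokens, i), label)
    for tokens, labels in train
    for i, label in enumerate(labels)
]

test_tokens = ["she", "is", "co", "coming"]
gold = ["OK", "OK", "FRAG", "OK"]
predicted = [classify(memory, token_features(test_tokens, i)) for i in range(len(test_tokens))]
score = f_score(gold, predicted)
```

The F-score is computed on the fragmented class only, since fragments are rare and plain accuracy would mostly reflect the majority class. A rule learner such as RIPPER would instead induce explicit if-then rules over the same kind of feature vectors.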
Similar resources
The Relationship between Self-esteem and Conversational Dominance of Iranian EFL Learners’ Speaking
The crucial role of affective factors like anxiety, inhibition, motivation and self-esteem has long been of interest in the field of language learning due to their strong association with the cognitive processes involved in performance in a second or foreign language. This study aimed at investigating the relationship between Iranian EFL learners’ self-esteem and conversational dominance in ...
Identifying Discourse Markers in Spoken Dialog
In this paper, we present a method for identifying discourse marker usage in spontaneous speech based on machine learning. Discourse markers are denoted by special POS tags, and thus the process of POS tagging can be used to identify discourse markers. By incorporating POS tagging into language modeling, discourse markers can be identified during speech recognition, in which the timeliness of t...
Classification of discourse functions of affirmative words in spoken dialogue
We present results of a series of machine learning experiments that address the classification of the discourse function of single affirmative cue words such as alright, okay and mm-hm in a spoken dialogue corpus. We suggest that a simple discourse/sentential distinction is not sufficient for such words and propose two additional classification sub-tasks: identifying (a) whether such words conv...
The Impact of Language Learning Activities on the Spoken Language Development of 5-6-Year-Old Children in Private Preschool Centers of Langroud
N. Bagheri, M.A.; E. Abbasi, Ph.D.; M. GeramiPour, Ph.D. The present study was conducted to investigate the impact of language learning activities on the development of spoken language in 5-6-year-old children at private preschool center...
Vague Language and Interpersonal Communication: An Analysis of Adolescent Intercultural Conversation
This paper is concerned with the analysis of the spoken language of teenagers, taken from a newly developed specialised corpus, the British and Taiwanese Teenage Intercultural Communication Corpus (BATTICC). More specifically, the study employs a discourse analytical approach to examine vague language in an intercultural context among a group of British and Taiwanese adolescents, paying particul...